16 research outputs found
Publishing Microdata with a Robust Privacy Guarantee
Today, the publication of microdata poses a privacy threat. Vast research has
striven to define the privacy condition that microdata should satisfy before it
is released, and devise algorithms to anonymize the data so as to achieve this
condition. Yet, no method proposed to date explicitly bounds the percentage of
information an adversary gains after seeing the published data for each
sensitive value therein. This paper introduces beta-likeness, an appropriately
robust privacy model for microdata anonymization, along with two anonymization
schemes designed therefor, the one based on generalization, and the other based
on perturbation. Our model postulates that an adversary's confidence on the
likelihood of a certain sensitive-attribute (SA) value should not increase, in
relative difference terms, by more than a predefined threshold. Our techniques
aim to satisfy a given beta threshold with little information loss. We
experimentally demonstrate that (i) our model provides an effective privacy
guarantee in a way that predecessor models cannot, (ii) our generalization
scheme is more effective and efficient in its task than methods adapting
algorithms for the k-anonymity model, and (iii) our perturbation method
outperforms a baseline approach. Moreover, we discuss in detail the resistance
of our model and methods to attacks proposed in previous research.Comment: VLDB201
Integrative Dynamic Reconfiguration in a Parallel Stream Processing Engine
Load balancing, operator instance collocations and horizontal scaling are
critical issues in Parallel Stream Processing Engines to achieve low data
processing latency, optimized cluster utilization and minimized communication
cost respectively. In previous work, these issues are typically tackled
separately and independently. We argue that these problems are tightly coupled
in the sense that they all need to determine the allocations of workloads and
migrate computational states at runtime. Optimizing them independently would
result in suboptimal solutions. Therefore, in this paper, we investigate how
these three issues can be modeled as one integrated optimization problem. In
particular, we first consider jobs where workload allocations have little
effect on the communication cost, and model the problem of load balance as a
Mixed-Integer Linear Program. Afterwards, we present an extended solution
called ALBIC, which support general jobs. We implement the proposed techniques
on top of Apache Storm, an open-source Parallel Stream Processing Engine. The
extensive experimental results over both synthetic and real datasets show that
our techniques clearly outperform existing approaches
Privacy-Preserving Data Publication for Static and Streaming Data
Ph.DDOCTOR OF PHILOSOPH
Discrete Particle Swarm Optimization Routing Protocol for Wireless Sensor Networks with Multiple Mobile Sinks
Mobile sinks can achieve load-balancing and energy-consumption balancing across the wireless sensor networks (WSNs). However, the frequent change of the paths between source nodes and the sinks caused by sink mobility introduces significant overhead in terms of energy and packet delays. To enhance network performance of WSNs with mobile sinks (MWSNs), we present an efficient routing strategy, which is formulated as an optimization problem and employs the particle swarm optimization algorithm (PSO) to build the optimal routing paths. However, the conventional PSO is insufficient to solve discrete routing optimization problems. Therefore, a novel greedy discrete particle swarm optimization with memory (GMDPSO) is put forward to address this problem. In the GMDPSO, particle’s position and velocity of traditional PSO are redefined under discrete MWSNs scenario. Particle updating rule is also reconsidered based on the subnetwork topology of MWSNs. Besides, by improving the greedy forwarding routing, a greedy search strategy is designed to drive particles to find a better position quickly. Furthermore, searching history is memorized to accelerate convergence. Simulation results demonstrate that our new protocol significantly improves the robustness and adapts to rapid topological changes with multiple mobile sinks, while efficiently reducing the communication overhead and the energy consumption
Efficient tree pattern queries on encrypted XML documents
Outsourcing XML documents is a challenging task, because it encrypts the documents, while still requiring efficient query processing. Past approaches on this topic either leak structural information or fail to support searching that has constraints on XML node content. In addition, they adopt a filtering-and-refining framework, which requires the users to prune false positives from the query results. To address these problems, we present a solution for efficient evaluation of tree pattern queries (TPQs) on encrypted XML documents. We create a domain hierarchy, such that each XML document can be embedded in it. By assigning each node in the hierarchy a position, we create for each document a vector, which encodes both the structural and textual information about the document. Similarly, a vector is created also for a TPQ. Then, the matching between a TPQ and a document is reduced to calculating the distance between their vectors. For the sake of privacy, such vectors are encrypted before being outsourced. To improve the matching efficiency, we use a k-d tree to partition the vectors into non-overlapping subsets, such that non-matchable documents are pruned as early as possible. The extensive evaluation shows that our solution is efficient and scalable to large dataset